RELAXATION-BASED COARSENING AND 
MULTISCALE GRAPH ORGANIZATION 
Abstract. In this paper we generalize and improve the multiscale organization of graphs by 
introducing a new measure that quantifies the "closeness" between two nodes. The calculation of 
the measure is linear in the number of edges in the graph and involves just a small number of 
relaxation sweeps. A similar notion of distance is then calculated and used at each coarser level. 
We demonstrate the use of this measure in multiscale methods for several important combinatorial 
optimization problems and discuss the multiscale graph organization. 

1. Introduction. A general approach for solving many large-scale graph prob- 
lems, as well as most other classes of large-scale computational science problems, is 
through multilevel (multiscale, multiresolution, etc.) algorithms. This approach generally 
involves coarsening the problem, producing from it a sequence of progressively 
coarser levels (smaller, hence simpler, related problems), then recursively using the 
(approximate) solution of each coarse problem to provide an initial approximation to 
the solution at the next-finer level. At each level, this initial approximation is first 
improved by what we generally call "local processing" (LP). This is an inexpensive 
sequence of short steps, each involving only a few unknowns, together covering all 
unknowns of that level several times over. The usual examples of LP are a few sweeps 
of classical (e.g., Gauss-Seidel or Jacobi) relaxation in the case of solving a system of 
equations, or a few Monte Carlo passes in statistical-physics simulations. Following 
the LP, the resulting approximation may be further improved by one or several cy- 
cles, each using again a coarser-level approximation followed by LP, applying them at 
each time to the residual problem (the problem of calculating the error in the current 
approximation). See, for example, references [6, 7, 11, 12, 13, 14, 38, 42]. 

At each level of coarsening one needs to define the set of coarse unknown variables 
and the equations (or the stochastic relations) that they should satisfy (or the energy 
that they should minimize). Each coarse unknown is defined in terms of the 
next-finer-level unknowns (defined, not calculated: they are all unknowns until the coarse 
level is approximately solved and the fine level is interpolated from that solution). 
The following are examples: 

• The set of coarse unknowns can simply represent a chosen subset of the fine- 
level set. 

• If the fine-level variables are real numbers or vectors, each coarse variable can 
represent a weighted average of several of them. 

• If the fine-level variables are Ising spins (having only values of +1 or -1), 
each coarse variable can again be an Ising spin, representing the sign of the 
sum of several fine spins. 

• A coarse variable can be defined from several fine variables by a stochastic 
process ([5], for example). 

• In the case of graph problems, each node of the coarse graph can represent 
an aggregate of several fine-level nodes or a weighted aggregate of such nodes, 
that is, allowing each fine-level node to be split between several aggregates. 

The choice of an adequate local processing at a fine level and the choice of an 
adequate set of variables at the next-coarser level are strongly coupled. The general 
guiding rule [10] is that this pair of choices is good if (and to the extent that) a fine- 
level solution can always be recovered from the corresponding set of coarse variables 
by a short iterative use of a suitably modified version of the LP. That version is called 
compatible LP (CLP). Examples are compatible Monte Carlo (CMC), introduced in 
[13], and compatible relaxation (CR), introduced in [8]. 

The CLP, needed in several important upscaling procedures (such as the selection 
of the coarse variables, the acceleration of the fine-level simulations, and the 
processing of fine-level windows within coarse simulations; see [10]) can also be used 
for performing the interpolation from the coarse solution to obtain the first approxi- 
mation at the fine level. When possible, however, the construction of a more explicit 
interpolation is desired in order to apply it for the direct formulation of equations (or 
an energy functional) that should govern the coarse level, as in Galerkin coarsening. 

In the process of defining the set of coarse variables and in constructing an explicit 
interpolation, it is important to know how "close" two given fine-level variables are 
to each other at the stage of switching to the coarse level. We need to know, in other 
words, to what extent the value after the LP of one variable implies the value of the 
other. If they are sufficiently close, they can, for example, be aggregated to form a 
coarse variable. 

The central issue addressed in the present article is how to measure this "close- 
ness" between two variables in a system of equations or between two nodes in a given 
graph. (We consider the latter to be a special case of the former, by associating the 
graph with the system formed by its Laplacian.) More generally, we want to define 
the distance of one variable $x_i$ from a small subset S of several variables, in order to 
measure how well $x_i$ can be interpolated from S following the LP. 

In classical Algebraic Multigrid (AMG), aimed at solving the linear system 

\[ Ax = b \quad \text{or} \quad \sum_{j=1}^{n} a_{ij} x_j = b_i , \quad (i = 1, \dots, n) , \tag{1.1} \]

the closeness of two unknowns $x_i$ and $x_j$ is measured simply by the relative size of 
their coupling $a_{ij}$, for example, by the quantity 

\[ |a_{ij}| \Big/ \max\Big( \sum_k |a_{ik}| , \sum_k |a_{kj}| \Big) \tag{1.2} \]

(or similarly by the relative size of their coupling in some power of A) . Although this 
definition has worked well for the coarsening procedures of discretized scalar elliptic 
differential equations, it is not really effective, and sometimes meaningless, for systems 
lacking sufficient diagonal dominance (including many discretized nonscalar elliptic 
systems). Moreover, even for systems with a fully dominant diagonal (such as the 
Laplacian of a graph), the classical AMG definition may result in wrong coarsening, 
for example, in graphs with nonlocal edges (see example in Sec. 3). 
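
As a small illustration of the classical measure, the following sketch evaluates it on a dense matrix. It assumes that (1.2) reads |a_ij| / max(sum_k |a_ik|, sum_k |a_kj|); the function name `classical_strength` and the list-of-rows matrix format are choices made here, not part of the paper.

```python
def classical_strength(A, i, j):
    """Relative coupling size in the spirit of (1.2), for a dense matrix A (list of rows)."""
    row_sum = sum(abs(a) for a in A[i])        # sum_k |a_ik|
    col_sum = sum(abs(row[j]) for row in A)    # sum_k |a_kj|
    return abs(A[i][j]) / max(row_sum, col_sum)


# Laplacian of the path graph 0 - 1 - 2 with unit edge weights.
A = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]

strength_01 = classical_strength(A, 0, 1)
print(strength_01)
```

The measure looks only at the direct entry a_ij and the sizes of the row and column it sits in, which is why a single heavy nonlocal edge can appear as a strong connection.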

Instead, we propose to define the "closeness" between two variables exactly, by 
measuring how well their values are correlated at the coarsening stage, namely, fol- 
lowing the LP relaxation sweeps. Since the coarse level is actually applied to the 
residual system, the two variables will be considered close if their errors have nearly 
the same ratio in all relaxed vectors. We will thus create a sequence of K normalized 
relaxed error vectors $x^{(1)}, \dots, x^{(K)}$, each obtained by relaxing the homogeneous system 
$Ax = 0$ from some (e.g., random) start and then normalizing the result. We will then 
define the algebraic distance (reciprocal of "closeness") between any two variables $x_i$ 
and $x_j$ as 

\[ \min_{\eta} \Big( \sum_{k=1}^{K} \big| \eta x_i^{(k)} - \eta^{-1} x_j^{(k)} \big|^p \Big)^{1/p} , \tag{1.3} \]

where $p \ge 2$ in order to attach larger weights to larger differences (using usually either 
$p = 2$ or the maximum norm ($p \to \infty$)). This use of $\eta$ gives a symmetric measure of 
how well $x_i$ can be interpolated from $x_j$ or vice versa. For the graph Laplacian (and 
other zero-sum A) this can be simplified to a distance defined as 

\[ \Big( \sum_{k=1}^{K} \big( x_i^{(k)} - x_j^{(k)} \big)^2 \Big)^{1/2} \quad \text{or} \quad \max_{k} \big| x_i^{(k)} - x_j^{(k)} \big| . \tag{1.4} \]

More generally, the distance of a node $x_i$ from a subset S of several nodes can 
similarly be defined as the deviation of the best-fitted interpolation from S to $x_i$, 
where the deviation is the Li norm of the vector of K errors obtained upon applying 
the interpolation to our K normalized relaxed error vectors, and the best-fitted in- 
terpolation is the one having the minimal deviation. (This least-square interpolation 
is the one introduced in bootstrap AMG (BAMG) [9] for the coarse-to-fine explicit 
interpolation.) 

An essential aspect of the "algebraic distance" defined here is that it is a crude 
local distance. It measures meaningful closeness only between neighboring nodes; the 
closer they are the less fuzzy is their measured distance. For nodes that should not 
be considered as neighbors, their algebraic distance just detects the fact that they are 
far apart; its exact value carries no further meaning. The important point is that this 
crude local definition of distance is fast to calculate and is all that is required for the 
coarsening purposes. A similar notion of distance is then similarly calculated at each 
coarser level. 

Indeed, we argue that meaningful distances in a general graph should, in princi- 
ple, be defined (not just calculated) only in such a multiscale fashion. This essential 
viewpoint, and relations to diffusion distances and spectral clustering are discussed in 
Section 5. In particular, we advocate the replacement of spectral methods by AMG- 
like multilevel algorithms, which are both faster and more tunable to define better 
solutions to many fuzzy graph problems (see, for example, [41, 42]). 

The paper is organized as follows. The graph problems we use to demonstrate 
our approach are introduced in Section 2. The calculation of the "algebraic distance" and 
its use within the multiscale algorithm is described in Section 3. Results of tests are 
summarized in Section 4. Finally, the relations of our approach to diffusion distances 
and spectral clustering are discussed in Section 5. 

2. Notation and problem definitions. Consider a weighted graph G = (V, E), 
where $V = \{1, 2, \dots, n\}$ is the set of nodes (vertices) and E is the set of edges. Denote 
by $w_{ij}$ the non-negative weight (coupling) of the undirected edge ij between nodes 
i and j; if $ij \notin E$, then $w_{ij} = 0$. We consider as our examples the following two 
optimization problems. 
2.1. Linear ordering. Let $\pi$ be a bijection 

\[ \pi : V \to \{1, 2, \dots, n\} . \]

The purpose of linear ordering problems is to minimize some functional over all possible 
permutations $\pi$. The following functional should be minimized for the minimum 
p-sum problem¹ (MpSP): 

\[ \sigma_p(G, \pi) = \sum_{ij} w_{ij} |\pi(i) - \pi(j)|^p . \tag{2.1} \]

In the generalized form of the problem that emerges during the multilevel solver, each 
vertex i is assigned a volume (or length), denoted $v_i$. Given the vector of all 
volumes, v, the task now is to minimize the cost 

\[ \sigma_p(G, \pi, v) = \sigma_p(G, x) = \sum_{ij} w_{ij} |x_i - x_j|^p , \]

where $x_i = \frac{v_i}{2} + \sum_{k :\, \pi(k) < \pi(i)} v_k$, that is, each vertex is positioned at its center of mass, 
capturing a segment on the real axis that equals its length. The original form of the 
problem is the special case where all the volumes are equal. In particular, we would like 
to concentrate on the minimum linear arrangement (where $p = 1$) and the minimum 
2-sum problem (M2SP) that were proven to be NP-complete in [23, 24] and whose 
solution can serve as an approximation for many different linear ordering problems 
replacing the spectral approaches [41, 42]. 
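
The cost of a given ordering can be evaluated directly from the description above. The sketch below is illustrative only: the function name `p_sum_cost` and the dictionary-based graph format are assumptions made here, and each vertex is placed at the center of the segment it occupies, as the generalized form describes. With unit volumes the coordinate differences coincide with the differences of positions in the ordering, so the cost reduces to (2.1).

```python
def p_sum_cost(edges, order, volumes=None, p=2):
    """Cost of a linear ordering.

    edges:   {(i, j): w_ij}, each undirected edge listed once
    order:   list of vertices from left to right (the permutation)
    volumes: {vertex: v_i}; unit volumes if omitted
    """
    if volumes is None:
        volumes = {v: 1.0 for v in order}
    x, left = {}, 0.0
    for v in order:
        x[v] = left + volumes[v] / 2.0   # center of the segment occupied by v
        left += volumes[v]
    return sum(w * abs(x[i] - x[j]) ** p for (i, j), w in edges.items())


edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 2.0}
cost_a = p_sum_cost(edges, [0, 1, 2], p=1)   # heavy edge (0, 2) is stretched over two positions
cost_b = p_sum_cost(edges, [0, 2, 1], p=1)   # heavy edge (0, 2) joins adjacent positions
print(cost_a, cost_b)
```

In this toy instance, placing the endpoints of the heaviest edge next to each other lowers the linear arrangement cost.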

2.2. Partitioning. The goal of the 2-partitioning problem is to find a partition 
of V into two disjoint nonempty subsets $\Pi_1$ and $\Pi_2$ that solves 

\[ \text{minimize} \sum_{i \in \Pi_1,\, j \in \Pi_2} w_{ij} , \quad \text{subject to } |\Pi_k| \le (1 + \alpha) \cdot \frac{|V|}{2} , \quad (k = 1, 2) , \tag{2.2} \]

where $\alpha$ is a given imbalance factor. 
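
For concreteness, the objective and the balance constraint can be checked for a candidate partition as follows. This is a sketch under the assumption that the size bound in (2.2) is (1 + alpha) |V| / 2; the helper names `cut_weight` and `is_balanced` are hypothetical.

```python
def cut_weight(edges, part):
    """Total weight of edges whose endpoints lie in different parts."""
    return sum(w for (i, j), w in edges.items() if part[i] != part[j])


def is_balanced(part, alpha):
    """Both parts nonempty and no larger than (1 + alpha) * |V| / 2 (assumed bound)."""
    n = len(part)
    sizes = [sum(1 for side in part.values() if side == k) for k in (0, 1)]
    return all(0 < size <= (1 + alpha) * n / 2 for size in sizes)


edges = {(0, 1): 1.0, (1, 2): 0.5, (2, 3): 1.0, (0, 3): 2.0}
even_split = {0: 0, 1: 0, 2: 1, 3: 1}
skewed_split = {0: 0, 1: 1, 2: 1, 3: 1}

print(cut_weight(edges, even_split), is_balanced(even_split, 0.0))
print(cut_weight(edges, skewed_split), is_balanced(skewed_split, 0.2))
```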

Graph partitioning is an NP-hard problem [22] used in many fields of computer 
science and engineering. Applications include VLSI design, minimizing the cost of 
data distribution in parallel computing, and optimal task scheduling. Because of its 
practical importance, many different heuristics (spectral [36], combinatorial [31, 21], 
evolutionist [15], etc.) have been developed to provide an approximation in a reason- 
able (and, one hopes, linear) computational time. However, only the introduction of 
multilevel methods for partitioning [30, 35, 2, 34, 46, 3, 37, 4, 27, 29, 1] has really 
provided a breakthrough in efficiency and quality. 

3. The coarsening algorithm. In the multilevel framework a hierarchy of decreasing-size 
graphs $G_0, G_1, \dots, G_k$ is constructed. Starting from the given graph, 
$G_0 = G$, we create by recursive coarsening the sequence $G_1, \dots, G_k$, then solve the 
coarsest level directly, and uncoarsen the solution back to G. 
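
The overall control flow just described can be sketched generically. Everything problem-specific is passed in as a callback; the names `multilevel_solve`, `coarsen`, `solve_coarsest`, `uncoarsen`, and `is_small` are placeholders introduced here, and the toy callbacks below merely pair up consecutive nodes to show the mechanics, not the coarsening proposed in this paper.

```python
def multilevel_solve(graph, coarsen, solve_coarsest, uncoarsen, is_small):
    """Build G_0, ..., G_k by repeated coarsening, solve G_k, then uncoarsen."""
    levels = [graph]
    while not is_small(levels[-1]):
        levels.append(coarsen(levels[-1]))
    solution = solve_coarsest(levels[-1])
    # Carry the solution back, one level at a time, to the original graph.
    for fine in reversed(levels[:-1]):
        solution = uncoarsen(fine, solution)
    return solution, len(levels)


# Toy callbacks: a "graph" is a list of nodes, an aggregate is a tuple of nodes.
def pair_up(nodes):
    return [tuple(nodes[k:k + 2]) for k in range(0, len(nodes), 2)]


def expand(fine_nodes, coarse_order):
    # The coarse ordering induces a fine ordering by listing each aggregate's members.
    return [member for aggregate in coarse_order for member in aggregate]


ordering, n_levels = multilevel_solve(
    list("abcdefgh"),
    coarsen=pair_up,
    solve_coarsest=list,
    uncoarsen=expand,
    is_small=lambda nodes: len(nodes) <= 2,
)
print(ordering, n_levels)
```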

In general, the AMG-based coarsening is interpreted as a process of weighted 
aggregation of the graph nodes to define the nodes of the next coarser graph. In 
weighted aggregation each node can be divided into fractions, and different fractions 
belong to different aggregates. The construction of a coarse graph from a given one 
is divided into three stages. First a subset of the fine nodes is chosen to serve as the 
seeds of the aggregates (the nodes of the coarse graph). Then the rules for aggregation 
are determined, thereby establishing the fraction of each nonseed node belonging to 
each aggregate. Finally, the graph couplings (or edges) between the coarse nodes are 
calculated. The entire coarsening scheme is shown in Algorithm 1. 

¹ We use this definition for simplicity. The usual definition of the functional is $\sigma_p(G, \pi) = \big( \sum_{ij} w_{ij} |\pi(i) - \pi(j)|^p \big)^{1/p}$, which yields the same minimization problem. 

The AMG-based multilevel framework for graph optimization problems is dis- 
cussed, for example, in [ ]. In the present work we generalize the coarsening part of 
the AMG-based framework. The problem-dependent solution of the coarsest level and 
the uncoarsening are not changed here. They are fully described in [42] and references 
therein. 

The principal difference between the previous AMG-based coarsening approaches 
[42, 28, 17] and the new relaxation-based approach is the improved measure, the 
algebraic coupling, assigned to each edge, or, more generally, between any two nodes, 
in the graph. The algebraic coupling is the reciprocal of the calculated algebraic 
distance introduced below. 

Algebraic distance and coupling. The need for an improved measure for the 
graph couplings can be explained by observing the graph depicted in Figure 3.1: one 
additional edge ij (connecting nodes i and j) is added to a regular two-dimensional 
mesh. While coarsening, nodes i and j clearly should not belong to the same aggregate 
unless their coupling is much stronger than other graph couplings. However, if the 
weight of ij is just somewhat larger than all other graph edges, and if the black dots are 
some of the seeds of the coarse aggregates (chosen by some AMG-based criterion; see, 
for example, Algorithm 2), node i will tend to be aggregated with node j, rather than 
with any of its neighbors. Such a decision will create bad coarse-level approximations 
in many optimization problems (e.g., linear ordering and partitioning). Moreover, at 
the next-coarser levels the approximation may further deteriorate by making similar 
wrong decisions, making the entire neighborhood of i close to j, thereby promoting 
linear arrangements in which many local couplings would unnecessarily become long- 
range ones. To prevent this situation we would like to have a measure that not only 
evaluates the coupling between i and j according to the direct coupling between them 
but also takes into account the contribution of connections between the neighborhoods 
of i and j. That is, if the immediate (graph) neighbors of i are connected to those 
of j, the coupling between i and j should be enhanced, while if i's neighbors are not 
connected to those of j, as in Figure 3.1, a significant weakening of the ij coupling is 
due. This measure will prevent possible errors while coarsening. 

We introduce the notion of algebraic distance, which is based on the same set 
of test vectors (TVs) being used in the bootstrap AMG (BAMG) [9]. The key new 
ingredient of the adaptive BAMG setup is the use of several TVs, collectively repre- 
senting algebraically smooth error, to define the interpolation weights. When a priori 
knowledge of the nature of this error is not available, slightly relaxed random vectors 
are used for this task. A set of some K low-residual TVs $\{x^{(k)}\}_{k=1}^{K}$ can first be obtained 
by relaxation. Namely, each $x^{(k)}$ is a result of r fine-level relaxation sweeps on 
the homogeneous equation $Ax = 0$, starting from a random approximation, where A 
is the Laplacian of the graph. In particular, we have used a small number (usually 
r = 10) of Jacobi underrelaxation sweeps with $\omega = 0.5$. That is, the new value for 
each $x^{(k)}$, $k = 1, \dots, K$ (in our tests K = 20) is 

\[ x^{(k)}_{\mathrm{NEW}} = (1 - \omega) x^{(k)} + \omega x^{(k)}_{\mathrm{JAC}} , \tag{3.1} \]
Fig. 3.1. Mesh graph with an additional edge between nodes i and j. The black dots mark some 
of the nodes selected to serve as the seeds of the coarse aggregates; see Algorithm 2. 

where 

\[ x^{(k)}_{\mathrm{JAC}} = D^{-1} (D - A) x^{(k)} , \tag{3.2} \]

D being the diagonal of A. The algebraic distance from node i to node j is defined 
over the K relaxed TVs by 

\[ d_{ij} = \max_{k = 1, \dots, K} \big| x_i^{(k)} - x_j^{(k)} \big| . \tag{3.3} \]

Other definitions, such as 

\[ d_{ij} = \sum_{k=1}^{K} \big( x_i^{(k)} - x_j^{(k)} \big)^2 , \tag{3.4} \]

are also possible. Hence, only if $d_{ij}$ is small may nodes i and j be aggregated into 
the same coarse node. The algebraic coupling between i and j, $c_{ij}$, is defined as the 
reciprocal of $d_{ij}$: 

\[ c_{ij} = 1 / d_{ij} . \tag{3.5} \]
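
The computation in (3.1)-(3.5) is short enough to sketch in plain Python. The sketch below is a minimal illustration, not the implementation used for the reported tests: the function names, the dict-of-dicts graph format, the uniform random start in [-0.5, 0.5], and the fixed seed are all assumptions made here, and no normalization of the relaxed vectors is performed. For a graph Laplacian, D - A is the weight matrix, so the Jacobi step D^{-1}(D - A)x is the weighted average of the neighbors' values.

```python
import random


def relaxed_test_vectors(w, K=10, r=20, omega=0.5, seed=0):
    """K test vectors, each relaxed by r Jacobi underrelaxation sweeps on Ax = 0.

    w: symmetric dict of dicts, w[i][j] = w_ij > 0 for every edge ij.
    """
    rng = random.Random(seed)
    nodes = sorted(w)
    vectors = []
    for _ in range(K):
        x = {i: rng.uniform(-0.5, 0.5) for i in nodes}
        for _ in range(r):
            # Jacobi value: weighted average of the neighbors, as in (3.2).
            jac = {i: sum(w_ij * x[j] for j, w_ij in w[i].items()) / sum(w[i].values())
                   for i in nodes}
            # Damped update, as in (3.1).
            x = {i: (1 - omega) * x[i] + omega * jac[i] for i in nodes}
        vectors.append(x)
    return vectors


def algebraic_distance(vectors, i, j):
    """Maximum-norm distance (3.3) over the relaxed test vectors."""
    return max(abs(x[i] - x[j]) for x in vectors)


def algebraic_coupling(vectors, i, j):
    """Reciprocal of the algebraic distance, as in (3.5)."""
    d = algebraic_distance(vectors, i, j)
    return float("inf") if d == 0 else 1.0 / d


# A 4-cycle with unit weights.
w = {0: {1: 1.0, 3: 1.0}, 1: {0: 1.0, 2: 1.0},
     2: {1: 1.0, 3: 1.0}, 3: {2: 1.0, 0: 1.0}}
tvs = relaxed_test_vectors(w, K=5, r=10)
d01 = algebraic_distance(tvs, 0, 1)
print(d01, algebraic_coupling(tvs, 0, 1))
```

Each sweep touches every edge once, so the total work is proportional to K times r times the number of edges, in line with the linear cost claimed in the abstract.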



Data: Q, ν 
Result: coarse graph 

For every edge ij derive its algebraic distance $d_{ij}$ (3.3) or (3.4) and algebraic 
coupling $c_{ij}$ (3.5); 
SelectCoarseNodes(Q, ν); 

Define the coarse graph using the matrix P in Equation (3.7); 
Algorithm 1: Coarsening scheme 

We return to the example in Figure 3.1 and demonstrate the outcome of Definition 
(3.3) by comparing $d_{ij}$ with $d_{i*} = \min\{d_{is} \mid s \text{ a nearest neighbor of } i\}$. We show that 
i will not tend to be connected to j unless $w_{ij}$ is as large as the sum of i's other couplings. 
Furthermore, we show that even if i is connected to j as a result of strong $w_{ij}$, i's other 
neighbors will not tend to be connected to i as well but will prefer other neighbors; 
hence the neighborhoods of i and j will not tend to be connected to each other. 
Consider Table 3.1. The number K of TVs is given in the leftmost column. The 
number r of Jacobi relaxation sweeps varies from 10 to 100, as shown in the second 
column from the left. Each of the four columns to the right presents the (natural) logarithm 
of $d_{ij}/d_{i*}$, averaged over 100 independent runs, for graph couplings $w_{uv} = 1$ when 
u and v are nearest neighbors, and $w_{ij} = 1$, 2, 3, or 4 as shown. The numbers in 
parentheses are the corresponding standard deviations. Clearly the strength of the 
coupling between i and j is relatively decreased when measured by the algebraic 
distance. For instance, if the graph coupling between i and j is 1 (as are all other 
couplings in the graph), then after 20 relaxation sweeps (with K = 10) $d_{ij}$ is about three 
times bigger than the minimum of the (algebraic distance of the) edges to i's four 
nearest neighbors. Thus, the algebraic coupling between i and j is not the strongest 
coupling of i (not even close to it), and hence it is practically guaranteed that i and j will not 
belong to the same coarse node. The importance of using more than just 1 TV can be 
seen from the values of the standard deviations: the use of 1 TV results in standard 
deviations similar to the average, which means that $\ln(d_{ij}/d_{i*})$ has a significant chance 
to become negative, so ij has a significant chance to be the strongest coupling of i. 
With 10 TVs this chance becomes much smaller, at least for $w_{ij} \le 2$. Even with 10 
TVs, however, the chance grows with $w_{ij}$, becoming more than 50% roughly when 
$w_{ij} \ge 4$. Thus, the aggregation of i with j becomes likely. This by itself is fine and 
justified. What we really need to avoid is that entire neighborhoods of i and j will, as 
a result, be aggregated at some coarser level. Hence, it is important to see whether 
the neighbors of i will tend to be aggregated with i (and thus also with j) or will prefer 
their other neighbors. To see that, we calculate, for a neighbor q of i (see Figure 3.1), the (natural) logarithm of $d_{qi}/d_{qi*}$, 
where $d_{qi*} = \min\{d_{qs} \mid s \text{ a nearest neighbor of } q \text{ other than } i\}$. As shown in Table 3.2, 
q would rather be aggregated with one of its other-than-i neighbors. For example, for 
K = 10, r = 20, and $w_{ij} = 4$, in 95 out of the 100 runs q would not have been connected 
with i. The main conclusion is that nodes i and j do not tend to be connected as long 
as $w_{ij}$ is smaller than the sum of all other couplings of i or of j. When the coupling 
is of the same strength, they will be connected about half the time, but then, no less 
important, the neighbors of i (and similarly of j) will not tend to join them but will 
prefer to be connected to other nearest-neighbor nodes. Similar results are obtained 
when using (3.4) to calculate $d_{ij}$. 
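
To make the role of the standard deviations concrete, one can turn a table entry "mean (standard deviation)" into a rough estimate of how often ij would be the strongest coupling of i. The sketch below assumes, purely for illustration, that the logarithm of the distance ratio is normally distributed across runs; the paper makes no such claim, and the function name is hypothetical.

```python
from statistics import NormalDist


def chance_ij_is_strongest(mean_log_ratio, std_log_ratio):
    """P(ln(d_ij / d_i*) < 0) under an assumed normal model of the run-to-run spread."""
    return NormalDist(mean_log_ratio, std_log_ratio).cdf(0.0)


# Entries of Table 3.1 for r = 10 relaxation sweeps.
one_tv = chance_ij_is_strongest(2.47, 1.51)          # K = 1,  w_ij = 1
ten_tvs = chance_ij_is_strongest(0.821, 0.281)       # K = 10, w_ij = 1
ten_tvs_heavy = chance_ij_is_strongest(-0.244, 0.313)  # K = 10, w_ij = 4
print(one_tv, ten_tvs, ten_tvs_heavy)
```

Under this assumed model the estimate drops sharply when going from 1 to 10 TVs at unit weight, and exceeds one half for the heaviest edge, which is consistent with the qualitative reading of the table given above.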

With the notion of the algebraic coupling in mind, the coarse nodes selection and 
the calculation of the aggregation weights are modified as follows. 

Seed selection. The construction of the set of seeds C and its complement F 
is guided by the principle that each F-node should be "strongly coupled" to C . We 
will include in C nodes with exceptionally large volume or nodes expected (if used as 
seeds) to aggregate around them an exceptionally large total volume of F-nodes. We 
start with C = 0, hence F = V, and then sequentially transfer nodes from F to C, 
as follows. As a measure of how large an aggregate seeded by $i \in F$ might grow, we 
define its future volume $\vartheta_i$ by 

(3.6) 

Nodes with future volume larger than ν times the average of the $\vartheta_i$'s are first transferred 
to C as most "representative" (in our tests ν = 2). The insertion of additional 







                 w_uv = 1 for (u, v) nearest neighbors

 K     r     w_ij = 1        w_ij = 2        w_ij = 3        w_ij = 4

 1     10    2.47(1.51)      1.88(1.74)      1.38(1.85)      1.14(1.69)
       20    2.74(1.74)      2.1(1.59)       1.4(1.26)       1.44(1.59)
       50    2.65(1.36)      1.98(1.76)      1.92(1.41)      1.5(1.59)
      100    3.03(1.72)      2.14(1.32)      1.51(1.42)      1.16(1.78)

 5     10    1(0.502)        0.628(0.416)    0.24(0.417)     -0.0484(0.397)
       20    1.34(0.442)     0.825(0.415)    0.435(0.358)    0.208(0.332)
       50    1.68(0.342)     1.04(0.338)     0.643(0.306)    0.362(0.296)
      100    1.78(0.467)     1.06(0.392)     0.743(0.369)    0.396(0.359)

10     10    0.821(0.281)    0.443(0.294)    0.022(0.293)    -0.244(0.313)
       20    1.09(0.268)     0.624(0.239)    0.298(0.235)    0.0126(0.253)
       50    1.49(0.263)     0.86(0.235)     0.504(0.226)    0.2(0.204)
      100    1.69(0.315)     1.01(0.275)     0.572(0.264)    0.285(0.271)

Table 3.1 

Statistical results (over 100 runs) for the average (and, in parentheses, the standard deviation) 
of $\ln(d_{ij}/d_{i*})$, calculated with K TVs and r Jacobi relaxation sweeps for different relative strengths 
of $w_{ij}$. 







                 w_uv = 1 for (u, v) nearest neighbors

 K     r     w_ij = 1        w_ij = 2        w_ij = 3        w_ij = 4

 1     10    0.975(1.67)     0.939(1.7)      1.14(1.63)      1.07(1.89)
       20    0.911(1.62)     1.09(1.31)      1.02(1.46)      0.931(1.64)
       50    1.37(1.77)      1.14(1.79)      1.28(1.49)      1.24(1.45)
      100    0.897(1.55)     1.23(1.45)      1.29(1.53)      1.31(1.44)

 5     10    0.382(0.534)    0.482(0.428)    0.416(0.52)     0.587(0.487)
       20    0.434(0.444)    0.472(0.366)    0.592(0.486)    0.663(0.458)
       50    0.498(0.436)    0.755(0.53)     0.784(0.526)    0.813(0.455)
      100    0.501(0.522)    0.746(0.544)    0.812(0.549)    0.816(0.535)

10     10    0.283(0.312)    0.299(0.316)    0.376(0.307)    0.401(0.357)
       20    0.362(0.281)    0.419(0.295)    0.449(0.288)    0.441(0.327)
       50    0.448(0.311)    0.531(0.35)     0.672(0.351)    0.604(0.333)
      100    0.464(0.377)    0.682(0.348)    0.839(0.374)    0.749(0.39)

Table 3.2 

Statistical results (over 100 runs) for the average (and, in parentheses, the standard deviation) 
of $\ln(d_{qi}/d_{qi*})$ (see Figure 3.1), calculated with K TVs and r Jacobi relaxation sweeps for different 
relative strengths of $w_{ij}$. 



F-nodes to C depends on a "strength of coupling to C" threshold Q (in our tests 
Q = 0.5), as specified in Algorithm 2. 
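
The seed selection can be sketched as follows. Because the future volumes are defined by (3.6), the sketch takes them as an input rather than computing them; the function name `select_seeds`, the dict-of-dicts format for the graph couplings `w` and the algebraic couplings `c`, and the tie-breaking by node index are assumptions made here. A node is moved to C when either its algebraic or its graph coupling to C falls below the fraction Q of its total coupling, following the two-part test of Algorithm 2.

```python
def select_seeds(theta, w, c, Q=0.5, nu=2.0):
    """theta: {i: future volume}; w, c: symmetric dicts of dicts over the edges."""
    average = sum(theta.values()) / len(theta)
    C = {i for i in theta if theta[i] > nu * average}   # exceptionally large nodes
    F = set(theta) - C

    def fraction_in_C(coupling, i):
        total = sum(coupling[i].values())
        return sum(val for j, val in coupling[i].items() if j in C) / total

    # Visit F-nodes in descending order of future volume (ties broken by index).
    for i in sorted(F, key=lambda node: (-theta[node], node)):
        if fraction_in_C(c, i) < Q or fraction_in_C(w, i) < Q:
            C.add(i)
            F.discard(i)
    return C, F


# Path graph 0 - 1 - 2 - 3 - 4 with unit weights; algebraic couplings set equal
# to the graph couplings for simplicity.
w = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0},
     3: {2: 1.0, 4: 1.0}, 4: {3: 1.0}}
theta = {i: 1.0 for i in w}
seeds, rest = select_seeds(theta, w, w)
print(sorted(seeds), sorted(rest))
```

On this uniform path every second node becomes a seed, so each remaining F-node has exactly half of its coupling going to C, which meets the threshold Q = 0.5.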

Coarse nodes. Each node in the chosen set C becomes the seed of an aggregate 
that will constitute one coarse-level node. Next it is necessary to determine for each 
$i \in F$ a list of $j \in C$ to which i will belong. Define the caliber, l, to be the maximal 
number of C-points allowed in that list. The selection we propose here is based on 
both measures: the graph couplings $w_{ij}$ and the algebraic couplings $c_{ij}$. Define 
for each $i \in F$ a coarse neighborhood $N^c(i) = \{j \in C : ij \in E\}$. Set D to be the 
maximal $c_{ij}$ in $N^c(i)$. Construct a possibly smaller coarse neighborhood by including 
only nodes with strong algebraic coupling, $\widetilde{N}^c(i) = \{j \in N^c(i) : c_{ij} \ge \beta D\}$; we use 
Data: Q, ν 

Result: set of seeds C 

Calculate $\vartheta_i$ (3.6) for each $i \in F$, and their average $\bar{\vartheta}$; 
C ← nodes i with $\vartheta_i > \nu \cdot \bar{\vartheta}$; 

forall $i \in F$ in descending order of $\vartheta_i$ do 
    if $\big( \sum_{j \in C \cap N(i)} c_{ij} / \sum_{j \in N(i)} c_{ij} \big) < Q$ or 
       $\big( \sum_{j \in C \cap N(i)} w_{ij} / \sum_{j \in N(i)} w_{ij} \big) < Q$ then move i from F to C; 
end 

Algorithm 2: SelectCoarseNodes(Q, ν) 



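$\beta = 0.5$. If $|\widetilde{N}^c(i)| > l$, then the final coarse neighborhood $\overline{N}^c(i)$ will include the 
l largest $w_{ij}$'s in $\widetilde{N}^c(i)$. If $|\widetilde{N}^c(i)| \le l$, then $\overline{N}^c(i) \leftarrow \widetilde{N}^c(i)$. This construction of the 
coarse neighborhood $\overline{N}^c(i)$ of $i \in F$ is summarized in Algorithm 3. (In the results 
below we have used only l = 1 and l = 2.) The classical AMG interpolation matrix 
P (of size |V| x |C|) is then defined by 

\[ P_{ij} = \begin{cases} w_{ij} \Big/ \sum_{k \in \overline{N}^c(i)} w_{ik} & \text{for } i \in F,\ j \in \overline{N}^c(i) \\ 1 & \text{for } i \in C,\ j = i \\ 0 & \text{otherwise.} \end{cases} \tag{3.7} \]

$P_{ij}$ represents the fraction of i that will belong to the jth aggregate. 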



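Data: i, l, β 

$N^c(i) \leftarrow \{j \in C : ij \in E\}$; 
$D = \max_{j \in N^c(i)} c_{ij}$; 
$\widetilde{N}^c(i) = \{j \in N^c(i) : c_{ij} \ge \beta D\}$; 
if $l < |\widetilde{N}^c(i)|$ then 
    $\overline{N}^c(i) \leftarrow$ the l largest $w_{ij}$'s in $\widetilde{N}^c(i)$; 
if $l \ge |\widetilde{N}^c(i)|$ then 
    $\overline{N}^c(i) \leftarrow \widetilde{N}^c(i)$; 

Algorithm 3: The coarse neighborhood $\overline{N}^c(i)$ 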
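
A compact sketch of this construction is given below. The function name and data layout are assumptions made here, and the threshold test is taken to be non-strict (c_ij >= beta * D), which is one possible reading of the source. The algebraic couplings filter the candidate seeds, and the graph couplings then rank the survivors; slicing to the caliber covers both branches of the algorithm.

```python
def coarse_neighborhood(i, C, w, c, caliber=2, beta=0.5):
    """Seeds to which the F-node i will belong, at most `caliber` of them.

    C: set of seeds; w, c: dicts of dicts with graph and algebraic couplings.
    """
    candidates = [j for j in w[i] if j in C]            # seeds adjacent to i
    if not candidates:
        return []
    D = max(c[i][j] for j in candidates)                # strongest algebraic coupling
    strong = [j for j in candidates if c[i][j] >= beta * D]
    strong.sort(key=lambda j: w[i][j], reverse=True)    # rank survivors by graph coupling
    return strong[:caliber]


# Node 0 has three seed neighbors. Seed 2 has the largest graph weight but a weak
# algebraic coupling, so it is filtered out before the ranking.
C = {1, 2, 3}
w = {0: {1: 1.0, 2: 3.0, 3: 2.0}}
c = {0: {1: 10.0, 2: 1.0, 3: 6.0}}
print(coarse_neighborhood(0, C, w, c, caliber=1))
print(coarse_neighborhood(0, C, w, c, caliber=2))
```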

Coarse graph couplings. The coarse couplings are constructed as follows. 
Let I(k) be the ordinal number in the coarse graph of the node that represents the 
aggregate around a seed whose ordinal number at the fine level is k. Following the 
weighted aggregation scheme used in [43], the edge connecting two coarse aggregates, 
p = I(i) and q = I(j), is assigned the weight $w^{(coarse)}_{pq} = \sum_{k \ne l} P_{ki} w_{kl} P_{lj}$. The 
volume of the ith coarse aggregate is $\sum_j v_j P_{ji}$. 
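
The accumulation of coarse weights and volumes can be sketched as below. It assumes the coarse weight is the sum over fine edges of P_kp * w_kl * P_lq and the coarse volume is the sum of v_j * P_jp, with the interpolation matrix stored sparsely as a dict of fractions per fine node; the function name `coarse_graph` and this storage format are choices made here.

```python
def coarse_graph(w, P, volumes):
    """w: {(k, l): w_kl}, each undirected fine edge listed once.
    P: {fine node: {coarse node: fraction}}; volumes: {fine node: v}.
    """
    coarse_w = {}
    for (k, l), w_kl in w.items():
        for p, P_kp in P[k].items():
            for q, P_lq in P[l].items():
                if p != q:   # couplings inside one aggregate do not form a coarse edge
                    key = (min(p, q), max(p, q))
                    coarse_w[key] = coarse_w.get(key, 0.0) + P_kp * w_kl * P_lq
    coarse_v = {}
    for j, fractions in P.items():
        for p, P_jp in fractions.items():
            coarse_v[p] = coarse_v.get(p, 0.0) + volumes[j] * P_jp
    return coarse_w, coarse_v


# Path 0 - 1 - 2 with unit weights; nodes 0 and 2 are seeds (coarse nodes 0 and 1),
# and node 1 is split evenly between the two aggregates.
w = {(0, 1): 1.0, (1, 2): 1.0}
P = {0: {0: 1.0}, 1: {0: 0.5, 1: 0.5}, 2: {1: 1.0}}
volumes = {0: 1.0, 1: 1.0, 2: 1.0}
coarse_w, coarse_v = coarse_graph(w, P, volumes)
print(coarse_w, coarse_v)
```

Since the fractions of each fine node sum to one, the total volume is preserved from the fine to the coarse level.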

4. Computational results. We demonstrate the power of our new relaxation- 
based coarsening scheme by comparing its experimental results with those of the 
classical AMG-based coarsening for three important NP-hard optimization problems: 
the minimum 2-sum ((2.1) with p = 2), the minimum linear arrangement ((2.1) with 
p = 1), and the minimum 2-partitioning (2.2) problems. In all cases the results are 
obtained by taking the lightest possible uncoarsening schemes, so that differences due 
to the different coarsening schemes are least blurred. 

We have implemented and tested the new coarsening scheme by using the linear 
ordering packages developed in [41] and in [40] and the Scotch package [35] on a 
Linux machine. The implementation is nonparallel and has not been optimized. The 
results should be considered only qualitatively and can certainly be improved by more 
advanced uncoarsening. Thus, no intensive attempt was made to achieve the best-known results 
for the particular test sets. The details regarding the uncoarsening schemes 
for the above problems are given in [41, 40, 17]. 

4.1. The minimum p-sum problem. We present the numerical comparison for 
two minimum p-sum problems: the minimum 2-sum problem and the minimum linear 
arrangement. For these problems we have designed a full relaxation-based coarsening 
solver and evaluated it on a test set of 150 graphs of different nature, size ($|V| < 5 \cdot 10^6$ 
and $|E| < 10^7$) and properties. The test graphs are taken from [19] and from 
real-life network data such as social networks, power grids, and peer-to-peer connections. 
Our solvers are free and can be downloaded with detailed solutions for every graph 
from [ ]. To emphasize the difference in the minimization results between the two 
coarsening schemes (the relaxation-based and the classical AMG-based schemes), we 
measure the results obtained at the end of the multilevel cycle before the final local 
optimization postprocessing (Gauss-Seidel relaxation and the local processing in [41, 
40]), as well as after its application. Moreover, we use small calibers, l = 1, 2, since 
these demonstrate more sharply the quality of matching between the F-points and the 
C-points. For higher calibers it is also important to use the adaptive BAMG scheme 
[9] for calculating the interpolation weights, which is beyond the scope of this work. 
Small calibers are important for maintaining the low complexity of the multilevel 
framework, which is vital, for example, for hypergraphs and expanders. 

The minimum 2-sum problem (M2SP). A comparison of the relaxation- 
based and AMG-based coarsenings with calibers 1 and 2 is presented in Figures 4.1(a) 
and 4.1(b), respectively. Each x-axis scale division corresponds to one graph from the 
test set. The y-axis corresponds to the ratio between the average cost obtained by 
100 runs of the AMG-based coarsening and the one obtained by 100 runs of the 
relaxation-based coarsening. Each figure contains two curves: the dashed curves with 
cost measurements before applying the postprocessing of local optimization (e.g., 
Gauss-Seidel relaxation, Window Minimization [41]) and the regular curves with cost 
measurements after adding such optimization steps. Clearly most graphs benefit 
from the relaxation-based coarsening, showing a ratio greater than 1. The ratio 
decreases when more optimization is used, especially since the Gauss-Seidel relaxation 
is a powerful algorithmic component for this problem and thus brings the results of the 
two coarsening schemes closer to each other, as is indicated by the regular curves. All 
of these results were obtained with K = 10 TVs. When we lowered K to 5, we 
observed no significant change in the results. Our number of Jacobi underrelaxation 
sweeps, r = 20, cannot be reduced by more than a factor of two since this relaxation scheme 
is expected to evolve slowly. A detailed analysis of the convergence properties is 
presented in [16]. 

The minimum linear arrangement problem. Similarly to the previous prob- 
lem, we designed a relaxation-based solver and established a series of experiments for 
the minimum linear arrangement problem. The experimental setup was identical 
to that of the M2SP. It was based on the solver designed in [40]. In this case we 
can observe even more significant improvement when employing the relaxation-based 
coarsening than for the M2SP. 

Fig. 4.1. Results for the minimum 2-sum problem. 




Fig. 4.2. Results for the minimum 1-sum (linear arrangement) problem. 



Which graphs are most beneficial? It is remarkable that the most beneficial 
graphs in our test set come from VLSI design and general optimization problems. We 
know that these graphs are very irregular (compared, for example, with finite-element 
graphs and with those that possess 2D/3D geometry). Thus, we may conclude that the 
algebraic couplings help to identify the weakness of nonlocal connections and prevent 
them from being aggregated. In several examples, we achieved the best known results 
with caliber 1, while using classical AMG-based approaches they can be achieved with 
bigger calibers only. 

An algebraic coupling-based algorithm. We have also tried a straightforward 
algorithm in which, during coarsening, the weights of the graph are simply replaced by 
their algebraic couplings. That is, in the if statement at the end of Algorithm 2, only 
the first term is taken into account (namely, $\big( \sum_{j \in C \cap N(i)} c_{ij} / \sum_{j \in N(i)} c_{ij} \big) < Q$). 
Similarly, in Algorithm 3, $w_{ij}$ (in the first if) is replaced by $c_{ij}$. We present the 
comparison of the resulting simple algebraic-coupling-based coarsening scheme with 
the mixed scheme described in Algorithms 2 and 3 in Figure 4.3. The comparison was 
done for the M2SP including postprocessing (of local optimization) using the same 
experimental setup. The bold curve corresponds to the ratios between the classical 
AMG-based results and the simple algebraic coupling-based coarsening scheme. To 
see the difference between this algorithm and the more elaborate one, we add a copy 
of its results, that is, the bold curve from Figure 4.1(a). The mixed version is clearly 
better: in about 25 more graphs the results are improved. The average improvement 
was 1.5%. 
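
The ratio test used by this simplified scheme can be sketched as follows. This is only an illustration of the inequality quoted above: the function name `weak_coupling_to_seeds`, the dictionary-based data layout, and the default value of `Q` are hypothetical, and the algebraic couplings `c` are assumed to have been computed beforehand.

```python
def weak_coupling_to_seeds(c, neighbors, seeds, i, Q=0.5):
    """Return True if node i is too weakly coupled to the current seed set.

    c         -- nested dict, c[i][j] is the algebraic coupling of i and j
                 (assumed precomputed)
    neighbors -- dict mapping each node i to its neighbor list N(i)
    seeds     -- set C of nodes already selected as seeds
    Q         -- threshold; 0.5 is a placeholder, not a value from the text
    """
    # Denominator: total coupling of i to all of its neighbors.
    total = sum(c[i][j] for j in neighbors[i])
    # Numerator: coupling of i to those neighbors that are already seeds.
    to_seeds = sum(c[i][j] for j in neighbors[i] if j in seeds)
    return to_seeds / total < Q
```

For example, a node with couplings 4.0 and 1.0 to its two neighbors fails the test when only the weakly coupled neighbor is a seed (ratio 0.2) and passes it when the strongly coupled neighbor is a seed (ratio 0.8).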




Fig. 4.3. Results for the minimum 2-sum problem: comparison of the algorithm based only 
on algebraic distances with the mixed (full relaxation-based) algorithm. Horizontal axis: 
graphs ordered by ratios. 

4.2. The minimum 2-partitioning problem. We compared relaxation-based 
coarsening with classical AMG-based coarsening by combining two packages. The 
coarsening part was the same as in the minimum p-sum problems. The uncoarsening 
was based on the Scotch package; details of its fastest version can be found in [17]. 

The comparison of the relaxation-based and the AMG-based coarsenings with 
caliber 1 is presented in Figure 4.4. The x- and y-axes are interpreted as in 
Figure 4.1. Included are 15 graphs of different nature and size. Details of 
the numerical results can be obtained from [39]. The four best ratios are obtained 
for graphs with power-law degree distributions. More results for the graph and 
hypergraph partitioning problems are reported in [16]. Even though the algorithm used 
there only replaces the originally given couplings with their algebraic couplings, it is 
already clear that better results are observed for most tested instances of both graphs 
and hypergraphs. 

4.3. Running time. The implementation of stationary iterative processes and 
their running time are well-studied issues. These topics are beyond the scope of 
this paper; we refer the reader to two books that discuss sequential and parallel 
matrix-vector multiplications and general relaxations [25, 26]. 
Typical running times of an AMG-based framework for linear ordering problems on 
graphs can be found in [40, 41, 42]. The introduction of the algebraic distance did 
not significantly increase those running-time estimates. 

Fig. 4.4. Results for the minimum 2-partitioning problem. Horizontal axis: graphs 
ordered by ratios. 

5. Multiscale distance definition and hierarchical organization. As mentioned 
in the introduction, the algebraic distance defined above is only a crude local 
distance, measuring meaningful relative distances only between neighboring nodes 
while also detecting which nodes should not be considered as close neighbors. This 
fuzzy local distance, which can be calculated rapidly, is all we need for coarsening. 
A similar distance is then calculated at each coarser level, thus yielding a multiscale 
definition of distances through the entire graph, where at large distances one defines 
the distances only between (usually large) aggregates of nodes, not between any 
individual pair of distant nodes. Such multiscale distances are not only far less 
expensive to calculate; we next list several reasons why, in principle, distances in a 
general large graph are better defined in such a multiscale fashion. 

• At large distances the detailed individual distances (the exact travelling time 
from each house in Baltimore to each house in Boston, say) are usually not 
of interest. 

• The distance in a general graph is a fuzzy notion, whose definition is to a 
certain extent arbitrary. The difference between the two distances of two 
neighboring nodes from a third, far one is much less than the difference be- 
tween various, equally legitimate distance definitions, and also far less than 
the accuracy of the graph data (e.g., its edge weights) and far less than the 
accuracy of solving the equations that define these distances. 

• The most important reason: At different scales different factors should in 
principle enter into the distance definition. In particular, at increasingly 
larger distances, intrinsic properties of increasingly larger aggregates should 
play a progressively more important role. For example, in image segmentation, 
while at the finest level the "closeness" of two neighboring pixels (i.e., 
their chance to belong to the same segment) can be defined by their color 
similarity, at larger scales the closeness of two neighboring patches should be 
defined in terms of the similarity in their average color (which is different 
from the direct color similarity of neighboring pixels along the boundary 
between the patches) and also in terms of similarity in various texture measures 
(color variances, shape moments of subaggregates, average orientation of fine 
embedded edges, etc.) and other aggregative properties [44], [45]. Another 
example: in the problem of identifying clusters in a large set of points in 
$\mathbb{R}^d$, at the finest level the distance between data points can simply be 
their Euclidean distance, while at coarser levels the distance between two 
aggregates of points should also take into account similarity in terms of aggregative 
properties, such as density, orientation and dimensionality [32]. 

• The multiscale definition of distance also brings much needed flexibility into 
the way distances at one level are converted into distances at coarser levels. 
For example, in a graph whose finest level consists of face images and their 
similarity scores, if at some coarse level node $A$ is the union of two 
fine-level nodes $A_1$ and $A_2$, and node $B$ is the union of $B_1$ and $B_2$, then the 
coarse weight $w_{AB}$ of the edge $(A, B)$ can be defined either as some average 
of $w_{A_1B_1}$, $w_{A_1B_2}$, $w_{A_2B_1}$ and $w_{A_2B_2}$, or alternatively as the maximum (or $L_p$ 
average with large $p$) of those four weights. The former choice (average) is 
more suitable if one wants to cluster faces having a similar pose, while the 
latter choice (max or $L_p$) is more suitable if we need clusters of images each 
belonging to the same person (or, generally, when the clustering should be 
based on transitive similarity). 
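
The two aggregation choices in the last item can be sketched as follows. The function name `coarse_weight`, the nested-dictionary layout of the fine weights, and the use of the arithmetic mean as "some average" are assumptions made for illustration; the $L_p$ average is taken here to be the power mean $(\frac{1}{n}\sum_k w_k^p)^{1/p}$, which tends to the maximum as $p$ grows.

```python
import numpy as np

def coarse_weight(w, A, B, p=None):
    """Coarse weight w_AB from the fine weights between members of A and B.

    w -- nested dict of fine-level weights, w[a][b] (assumed layout)
    A -- fine-level nodes forming coarse node A
    B -- fine-level nodes forming coarse node B
    p -- None for the arithmetic mean, float('inf') for the maximum,
         otherwise the exponent of the L_p (power-mean) average
    """
    vals = np.array([w[a][b] for a in A for b in B], dtype=float)
    if p is None:
        return float(vals.mean())           # "average" choice
    if np.isinf(p):
        return float(vals.max())            # "max" choice
    return float(np.mean(vals ** p) ** (1.0 / p))  # L_p average
```

With one strong and three weak fine-level similarities, the mean yields a moderate coarse weight, while the maximum (or a large-$p$ average) keeps the coarse edge nearly as strong as the single strong link, which is the behavior wanted when similarity should be transitive.
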

An ingenious rigorous definition of distances in a general graph, introduced in 
[18], is called diffusion distance. Denoting by $p(t, y | x)$ the probability of a random 
walk on the graph starting at $x$ to reach $y$ after $t$ steps, the diffusion distance between 
two nodes $x_i$ and $x_j$ is defined by 

$$d(x_i, x_j, t)^2 = \sum_y w(y) \left[ p(t, y | x_i) - p(t, y | x_j) \right]^2, \qquad (5.1)$$

with some suitable choice of the node weights $w$. This is, in fact, a multiscale definition 
of distance, with the diffusion time $t$ serving as the scaling parameter. And indeed 
the definition is used for hierarchical organizations of graphs (even though large-scale 
distances are still defined in detail for any pair of nodes). The calculation 
of our "algebraic distance" can be viewed as just a fast way to compute a crude 
approximation to diffusion distances at some small $t$. 
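
For a small graph, (5.1) can be evaluated directly from powers of the random-walk transition matrix. The sketch below assumes a dense weight matrix, a walk that moves along an edge with probability proportional to its weight, and node weights $w(y)$ equal to the reciprocal of the weighted degree when none are given; these choices, and the function name, are assumptions made for illustration.

```python
import numpy as np

def diffusion_distance_sq(W, i, j, t, node_weights=None):
    """Squared diffusion distance d(x_i, x_j, t)^2 of (5.1) on a small graph.

    W            -- dense symmetric nonnegative weight matrix, no isolated nodes
    t            -- nonnegative integer number of random-walk steps
    node_weights -- the weights w(y); defaults to 1/degree (an assumption)
    """
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1)
    P = W / deg[:, None]                # P[x, y] = probability of a step x -> y
    Pt = np.linalg.matrix_power(P, t)   # Pt[x, y] = p(t, y | x)
    if node_weights is None:
        node_weights = 1.0 / deg
    diff = Pt[i] - Pt[j]                # p(t, . | x_i) - p(t, . | x_j)
    return float(np.sum(node_weights * diff ** 2))
```

This direct evaluation forms a dense matrix power and is therefore practical only for small graphs; it is meant to illustrate the definition, not to suggest an efficient way of computing it.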

The essential practical point is that this crude and inexpensive "algebraic 
distance" is all one needs for solving graph problems by repeated coarsening. The 
calculation of the diffusion map (the diffusion distances at various scales t) for a large 
graph is, on the other hand, quite expensive, requiring the computation of (possibly many) 
eigenvectors of the graph Laplacian. The fast way to calculate them should involve 
using a multiscale algorithm such as AMG (which is likely to work well in those cases 
where a hierarchical organization of the graph is meaningful; the AMG solver can, by 
the way, calculate many eigenvectors for nearly the same work as calculating only one 
[33]). However, instead of calculating the diffusion map and then using it for organizing 
the graph, the AMG structure can itself be used directly, and more efficiently, for any 
such organization. 

Indeed, as pointed out in [9], the same coarsening procedures used by the AMG 
solver can directly be used for efficient hierarchical organizations (such as multiscale 
clustering) of a graph (as in [32]) or for multiscale segmentation of an image (as in 
[44], [45]). As exemplified in this article (and also in [41], [42]), this kind of procedure 
can also be used for many other types of graph problems; in particular, it can be 
used for detecting small hidden cliques in random graphs [ ]. 

Thus, for discrete graphs, and analogously also for related continuum field 
problems, although the diffusion map is a useful theoretical concept, it is often not the 
most practical tool. We believe this to be true for most if not all spectral graph 
methods (using eigenvectors of the graph Laplacian): The same AMG structure that 
would rapidly calculate the eigenvectors can be better used to directly address the 
problem at hand. As pointed out in the discussion of multiscale distances, this can 
yield not just faster solutions but also, and more importantly, better definitions and 
more tunable treatments for many practical problems. 

6. Conclusions. We have proposed a new measure that quantifies the "close- 
ness" between two nodes in a given graph. The calculation of the measure is linear 
in the number of edges in the graph and involves just a small number of relaxation 
sweeps. The calculated measure is all that is required for coarsening purposes. A 
similar notion of distance is then calculated and used at each coarser level. We 
demonstrate the use of this new measure for the minimum (1,2)-sum linear ordering 
problem and for the minimum 2-partitioning problem. The improvement in the 
results shows that this measure indeed detects the most important couplings in the 
graph and helps in producing a better coarsening, while at the same time preventing 
nonlocal vertices from belonging to the same coarse aggregate. 